Data complexity measured by principal graphs

Authors

  • Andrei Yu. Zinovyev
  • Eugenij Moiseevich Mirkes
Abstract

How to measure the complexity of a finite set of vectors embedded in a multidimensional space? This is a non-trivial question which can be approached in many different ways. Here we suggest a set of data complexity measures using universal approximators, principal cubic complexes. Principal cubic complexes generalise the notion of principal manifolds for datasets with nontrivial topologies. The type of the principal cubic complex is determined by its dimension and a grammar of elementary graph transformations. The simplest grammar produces principal trees. We introduce three natural types of data complexity: 1) geometric (deviation of the data’s approximator from some “idealized” configuration, such as deviation from harmonicity); 2) structural (how many elements of a principal graph are needed to approximate the data), and 3) construction complexity (how many applications of elementary graph transformations are needed to construct the principal object starting from the simplest one). We compute these measures for several simulated and real-life data distributions and show them in the “accuracy-complexity” plots, helping to optimize the accuracy/complexity ratio. We discuss various issues connected with measuring data complexity. Software for computing data complexity measures from principal cubic complexes is provided as well.
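The abstract does not give formulas for the three measures, so the following is only a minimal sketch under stated assumptions. It treats structural complexity as simple node and edge counts, and geometric complexity as a deviation from harmonicity, taken here to be the squared distance between each interior node and the mean of its neighbours. The function names `structural_complexity` and `harmonic_deviation` are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def structural_complexity(edges):
    """Assumed structural measure: (number of nodes, number of edges)."""
    nodes = {n for edge in edges for n in edge}
    return len(nodes), len(edges)


def harmonic_deviation(positions, edges):
    """Assumed geometric measure: for every node with at least two
    neighbours, add the squared distance between the node position and
    the mean position of its neighbours."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    total = 0.0
    for node, nbrs in neighbours.items():
        if len(nbrs) < 2:
            continue  # leaf nodes have no harmonicity condition here
        dim = len(positions[node])
        mean = [sum(positions[n][k] for n in nbrs) / len(nbrs) for k in range(dim)]
        total += sum((positions[node][k] - mean[k]) ** 2 for k in range(dim))
    return total


# A three-node chain embedded in the plane, once straight and once bent.
chain_edges = [(0, 1), (1, 2)]
straight = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
bent = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (2.0, 0.0)}

print(structural_complexity(chain_edges))
print(harmonic_deviation(straight, chain_edges))
print(harmonic_deviation(bent, chain_edges))
```

Under these assumptions the straight chain is perfectly harmonic and the bent chain is not, although both have the same structural complexity. This is the kind of distinction the geometric measure is meant to capture.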

Similar resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it gives us no way to select the number of clusters and the number of members ...

Robust principal graphs for data approximation

Revealing hidden geometry and topology in noisy data sets is a challenging task. Elastic principal graphs are a computationally efficient and flexible data approximator based on embedding a graph into the data space and minimizing an energy functional that penalizes the deviation of graph nodes both from data points and from a pluri-harmonic configuration (a generalization of linearity). The structure ...

The effect of knowledge based economic indicators on the countries' economic complexity

Countries’ economic growth and development depend significantly on their productive capacity. In this research, we aimed to investigate which components of a knowledge-based economy have a more meaningful role in production capacity. To measure production capacity, we used one of the most up-to-date indexes, the economic complexity index. The research used data panel consist...

Complexity and approximation ratio of semitotal domination in graphs

A set $S \subseteq V(G)$ is a semitotal dominating set of a graph $G$ if it is a dominating set of $G$ and every vertex in $S$ is within distance 2 of another vertex of $S$. The semitotal domination number $\gamma_{t2}(G)$ is the minimum cardinality of a semitotal dominating set of $G$. We show that the semitotal domination problem is APX-complete for bounded-degree graphs, and the semitotal dominatio...

Topographical complexity of multidimensional energy landscapes.

A scheme for visualizing and quantifying the complexity of multidimensional energy landscapes and multiple pathways is presented employing principal component-based disconnectivity graphs and the Shannon entropy of relative "sizes" of superbasins. The principal component-based disconnectivity graphs incorporate a metric relationship between the stationary points of the system, which enable us t...

Journal:
  • Computers & Mathematics with Applications

Volume 65

Publication date: 2013